Overview
The ChartsMaze EDL (Enterprise Data Lake) Pipeline is a comprehensive data integration system for NSE stock market data. With a single command, it produces a complete dataset of 2,775 stocks with 86 fields per stock, combining fundamental analysis, technical indicators, corporate events, and real-time market intelligence.
Quick Start
Get your first pipeline run in under 5 minutes
Installation
Set up Python environment and dependencies
Field Reference
Explore all 86 output fields and data sources
Pipeline Settings
Customize pipeline behavior and output
What You Get
The pipeline outputs a single compressed file: all_stocks_fundamental_analysis.json.gz
Output Highlights
- 2,775 NSE stocks with complete coverage
- 86 fields per stock across 13 categories
- ~2-4 MB compressed (from 30+ MB raw JSON)
- 4-8 minute runtime (excluding optional OHLCV fetch)
- Single command execution via run_full_pipeline.py
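Once a run completes, the compressed output can be inspected directly from Python. The following is a minimal sketch using only the standard library. It assumes the file sits in the current working directory, and it makes no assumption about whether the top-level JSON value is a list of stocks or an object keyed by symbol. The helper name `load_pipeline_output` is hypothetical and not part of the pipeline.

```python
import gzip
import json
from pathlib import Path

# File name taken from the overview; the location is an assumption.
OUTPUT_PATH = Path("all_stocks_fundamental_analysis.json.gz")


def load_pipeline_output(path):
    """Read a gzip-compressed JSON file and return the parsed object."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    if OUTPUT_PATH.exists():
        data = load_pipeline_output(OUTPUT_PATH)
        # len() works for both a list of stocks and a dict keyed by symbol.
        print(type(data).__name__, len(data))
    else:
        print(f"{OUTPUT_PATH} not found; run the pipeline first")
```

Reading in text mode (`"rt"`) lets `json.load` decode the stream without first writing the uncompressed JSON to disk.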
Key Features
Comprehensive Fundamentals
Quarterly results, P/E ratios, ROE, ROCE, sales growth, profit margins, and 5-year trends
Advanced Technical Indicators
RSI, MACD, SMA/EMA status, pivot points, ADR, RVOL, ATH tracking, and volume analysis
Corporate Events Tracking
Dividends, bonus issues, stock splits, rights issues, and upcoming results announcements
Regulatory Filings
Company filings via hybrid LODR + Legacy endpoints with deduplication
Real-Time News Feed
AI-sentiment tagged news (50 articles per stock) from live market sources
Market Intelligence
ASM/GSM surveillance lists, circuit stocks, bulk/block deals, and price band revisions
Historical OHLCV Data
Lifetime daily candles with smart incremental updates (optional, ~30 min first run)
Post-Earnings Analytics
Returns since earnings, max gains post-results, and quarterly performance tracking
F&O Enrichment
F&O flag, lot sizes, and next expiry dates for derivatives-enabled stocks
Automated Dependency Management
18-script pipeline with strict phase ordering and error resilience
Pipeline Architecture
The pipeline operates in 6 phases with strict dependency ordering:
Phase 1: Core Data
Foundation layer - fetches 2,775 stocks and creates master ISIN map
- fetch_dhan_data.py → master_isin_map.json + dhan_data_response.json
- fetch_fundamental_data.py → fundamental_data.json (35 MB)
Phase 2: Data Enrichment
Parallel fetch of 10+ data sources (company filings, announcements, indicators, news, corporate actions, surveillance lists, etc.)
Phase 2.5: OHLCV History (Optional)
Smart incremental download of lifetime daily candles
- First run: ~30 minutes
- Incremental updates: 2-5 minutes
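The incremental behaviour described above can be sketched as two small steps: find the first date not yet stored, then merge newly fetched candles into the stored history. This is an illustration under stated assumptions, not the pipeline's actual code. The candle layout (a dict with a `date` key) and both function names are hypothetical.

```python
from datetime import date, timedelta


def next_fetch_start(stored_candles, default_start):
    """Return the first date that still needs to be downloaded."""
    if not stored_candles:
        # Nothing stored yet: this corresponds to the long first run.
        return default_start
    last_stored = max(candle["date"] for candle in stored_candles)
    return last_stored + timedelta(days=1)


def merge_candles(stored, fresh):
    """Merge candles by date; a fresh candle replaces a stored one."""
    by_date = {candle["date"]: candle for candle in stored}
    by_date.update({candle["date"]: candle for candle in fresh})
    return [by_date[day] for day in sorted(by_date)]
```

Keying the merge on the date makes a repeated or overlapping fetch harmless, which is one plausible way to keep incremental runs short.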
Phase 3: Base Analysis
bulk_market_analyzer.py builds the master JSON structure with 60+ base fields
Phase 4: Enrichment
Sequential in-place modification of master JSON (order matters!)
- Advanced metrics (ADR, RVOL, ATH, Turnover)
- Earnings performance (post-results returns)
- F&O data (lot sizes, expiry dates)
- Corporate events + news feed (MUST BE LAST)
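Strict phase ordering can be illustrated with a small sequential runner. The sketch below is hypothetical and is not the contents of run_full_pipeline.py. It lists only the three script names that appear in this overview, and stopping at the first failure is an assumption, since the real pipeline may tolerate failures in some phases.

```python
import subprocess
import sys

# Only script names mentioned in the overview are listed; the remaining
# scripts are omitted rather than guessed.
PHASES = [
    ("Phase 1: Core Data", ["fetch_dhan_data.py", "fetch_fundamental_data.py"]),
    ("Phase 3: Base Analysis", ["bulk_market_analyzer.py"]),
]


def run_script(script):
    """Run one script with the current interpreter; True on exit code 0."""
    return subprocess.run([sys.executable, script]).returncode == 0


def run_phases(phases, runner=run_script):
    """Run phases in order and stop at the first script that fails.

    Returns (completed_scripts, failure), where failure is None on
    success or a (phase_name, script) tuple otherwise.
    """
    completed = []
    for phase_name, scripts in phases:
        for script in scripts:
            if not runner(script):
                return completed, (phase_name, script)
            completed.append(script)
    return completed, None
```

Passing the runner as a parameter keeps the ordering logic testable without launching any subprocesses.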
Data Categories (86 Fields)
Identity & Classification (6 fields)
- Symbol, Name, Listing Date
- Basic Industry, Sector, Index
Fundamentals (35 fields)
- Quarterly results: Net Profit, EPS, Sales, OPM (Latest, Previous, 2Q, 3Q, Last Year)
- QoQ % and YoY % for all metrics
- Sales Growth 5 Years, EPS history
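QoQ % and YoY % are both percentage changes that differ only in the comparison period. A minimal sketch follows. The function name is hypothetical, and dividing by the absolute value of the base (so that a move from a loss to a profit reads as positive) is an assumption about how negative bases are treated.

```python
def pct_change(current, previous):
    """Percentage change from previous to current, or None if undefined."""
    if current is None or previous is None or previous == 0:
        return None
    # abs() on the base is an assumption for handling negative bases.
    return (current - previous) / abs(previous) * 100.0
```

With this helper, QoQ % compares the latest quarter to the previous quarter, and YoY % compares the latest quarter to the same quarter of the prior year.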
Valuation Ratios (10 fields)
- Market Cap, Stock Price, P/E, Forward P/E, Historical P/E 5
- PEG, ROE, ROCE, D/E, OPM TTM
Ownership (4 fields)
- FII % change QoQ, DII % change QoQ
- Free Float %, Float Shares
Technical Indicators (7 fields)
- RSI (14), SMA Status (20, 50, 200), EMA Status (20, 200)
- Technical Sentiment, Pivot Point
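An SMA status field can be read as "is the latest close above or below its moving average?". The sketch below illustrates that reading only. The "Above"/"Below" labels, the function names, and the handling of short histories are assumptions, since the overview does not specify the field's exact values.

```python
def sma(closes, window):
    """Simple moving average of the last `window` closes, or None."""
    if len(closes) < window:
        return None
    return sum(closes[-window:]) / window


def sma_status(closes, windows=(20, 50, 200)):
    """Compare the latest close with each SMA; None when history is short."""
    if not closes:
        return {window: None for window in windows}
    last_close = closes[-1]
    status = {}
    for window in windows:
        average = sma(closes, window)
        if average is None:
            status[window] = None
        else:
            # Labels are hypothetical placeholders.
            status[window] = "Above" if last_close > average else "Below"
    return status
```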
Price Performance (9 fields)
- Returns: 1 Day, 1 Week, 1 Month, 3M, 6M, 1Y
- % from 52W High/Low, % from ATH, Gap Up %, Day Range
Volume & Liquidity (6 fields)
- RVOL (Relative Volume vs 20-day avg)
- 200 Days EMA Volume, % from 52W High Volume
- Daily Rupee Turnover (20/50/100 day MA)
- 30 Days Average Rupee Volume
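RVOL compares the latest session's volume with its recent average. A minimal sketch is shown below. Excluding the latest session from the 20-day baseline is an assumption, and the function name is hypothetical.

```python
def relative_volume(volumes, window=20):
    """Latest volume divided by the mean of the prior `window` sessions.

    `volumes` is ordered oldest to newest. Returns None when there is
    not enough history or the baseline is zero.
    """
    if len(volumes) < window + 1:
        return None
    baseline = sum(volumes[-window - 1:-1]) / window
    if baseline == 0:
        return None
    return volumes[-1] / baseline
```

A value above 1 means the stock traded more than its recent average. For example, 2.5 would indicate two and a half times the usual volume.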
Volatility (4 fields)
- 5/14/20/30 Days MA ADR (Average Daily Range)
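One common way to express Average Daily Range as a percentage is to average the daily high/low ratio over the window. The sketch below uses that definition as an assumption; the pipeline may compute ADR differently. The function name is hypothetical.

```python
def average_daily_range_pct(highs, lows, window):
    """ADR % over the last `window` sessions: 100 * (mean(high / low) - 1).

    `highs` and `lows` are ordered oldest to newest. Returns None when
    history is too short or a low is not positive.
    """
    if len(highs) != len(lows):
        raise ValueError("highs and lows must have the same length")
    if window <= 0 or len(highs) < window:
        return None
    recent = list(zip(highs[-window:], lows[-window:]))
    if any(low <= 0 for _, low in recent):
        return None
    mean_ratio = sum(high / low for high, low in recent) / window
    return 100.0 * (mean_ratio - 1.0)
```

Calling the function with windows of 5, 14, 20, and 30 would yield the four volatility fields listed above, under this assumed definition.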
Circuit & Price Bands (1 field)
- Current circuit limit (e.g., 2%, 5%, 10%, 20%)
Earnings Tracking (3 fields)
- Quarterly Results Date
- Returns since Earnings %, Max Returns since Earnings %
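The two return fields above can be sketched from a daily close series and the results date. This is an illustration under assumptions: the base price is taken as the first close on or after the results date, and the function name and input layout are hypothetical.

```python
from datetime import date


def returns_since_earnings(closes, results_date):
    """Return (return_pct, max_return_pct) since the results date.

    `closes` is a list of (date, close) tuples sorted oldest to newest.
    Returns None when no close exists on or after the results date.
    """
    window = [close for day, close in closes if day >= results_date]
    if not window or window[0] <= 0:
        return None
    base = window[0]  # assumed base: first close on/after the results date
    current_return = (window[-1] / base - 1.0) * 100.0
    max_return = (max(window) / base - 1.0) * 100.0
    return current_return, max_return
```

Because the base close is part of the window, the maximum return under this definition is never negative.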
Event Markers (1 field, multi-value)
9 event types with icons and dates:
- ★ LTASM/STASM | 📊 Results Recently Out | 🔑 Insider Trading
- 📦 Block Deal | # +/- Revision | ⏰ Results (DD-Mon)
- 🎁 Bonus | ✂️ Split | 💸 Dividend | 📈 Rights
Recent Announcements (1 field, array)
Top 5 regulatory filings with Date, Headline, PDF URL
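Combining filings from two endpoints and keeping the five most recent could look like the sketch below. The record keys follow the field description above (Date, Headline, PDF URL). The deduplication key (date plus headline), the ISO-formatted date strings, and the function name are assumptions.

```python
def merge_filings(lodr_filings, legacy_filings, limit=5):
    """Merge two filing lists, drop duplicates, keep the newest `limit`.

    Each filing is a dict with "Date" (assumed ISO string, YYYY-MM-DD),
    "Headline", and "PDF URL". When both sources carry the same filing,
    the copy from the first list is kept.
    """
    seen = set()
    merged = []
    for filing in list(lodr_filings) + list(legacy_filings):
        key = (filing["Date"], filing["Headline"])  # assumed identity
        if key in seen:
            continue
        seen.add(key)
        merged.append(filing)
    # ISO date strings sort chronologically as plain text.
    merged.sort(key=lambda filing: filing["Date"], reverse=True)
    return merged[:limit]
```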
News Feed (1 field, array)
Top 5 real-time news with Title, Sentiment, Date
Runtime Performance
Typical execution times (2024 MacBook Pro M1, 100 Mbps internet):
- Without OHLCV: 4-6 minutes
- With OHLCV (incremental): 6-10 minutes
- With OHLCV (first run): 30-40 minutes
Output Files
Primary Output
all_stocks_fundamental_analysis.json.gz
Optional Outputs (if FETCH_OPTIONAL = True)
Intermediate Files (auto-cleaned if CLEANUP_INTERMEDIATE = True)
Next Steps
Run Your First Pipeline
Follow the quickstart guide to produce your first dataset
Install Dependencies
Set up Python and required packages